Midterm

Author

Christine Raj

Research Question: Do countries with higher percentage of females with their needs for family planning met with modern methods have lower neonatal mortality rates?

Introduction

The World Health Organization has made one of its sustainable development goals increasing the number of women who use modern family planning methods. Family planning methods are contraceptive methods that allow women to be sexually active without the worry of becoming pregnant. There are many reasons to increase the amount of women who use family planning including population control, increased maternal health, decreased number of unwanted children, and many more reasons. One reason that increasing family planning may be a positive thing is that it may allow the women to be better prepared for having a child and increase the amount of care the neonate can have access to.

This leads me to the question of whether countries with higher percentage of females with their needs for family planning met with modern methods have lower neonatal mortality rates. I will also be looking at whether family planning has an association with 5 year mortality as well as adolescent birth rates.

Libraries and Packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidyr)
library(dplyr)
library(ggplot2)
library(lubridate)
library(leaflet)
library(readxl)
library(dtplyr)
library(data.table)

Attaching package: 'data.table'

The following objects are masked from 'package:lubridate':

    hour, isoweek, mday, minute, month, quarter, second, wday, week,
    yday, year

The following objects are masked from 'package:dplyr':

    between, first, last

The following object is masked from 'package:purrr':

    transpose
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
library(knitr)
library(rnaturalearth)
The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
which was just loaded, will retire in October 2023.
Please refer to R-spatial evolution reports for details, especially
https://r-spatial.org/r/2023/05/15/evolution4.html.
It may be desirable to make the sf package available;
package maintainers should consider adding sf to Suggests:.
The sp package is now running under evolution status 2
     (status 2 uses the sf package in place of rgdal)
Support for Spatial objects (`sp`) will be deprecated in {rnaturalearth} and will be removed in a future release of the package. Please use `sf` objects with {rnaturalearth}. For example: `ne_download(returnclass = 'sf')`
library(rnaturalearthdata)

Attaching package: 'rnaturalearthdata'

The following object is masked from 'package:rnaturalearth':

    countries110
library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

Introduction

Adding the Data and Merging and Manipulating it

#getwd()
#setwd("/Users/christineraj/Documents/PM566/")

#uploading the seperate data sheets and getting rid of all the labels from the first 5 rows
whodata1labels <- data.table::fread("whs2023_annex1.csv")
labels1 <- whodata1labels[1:5, ]
whodata1 <- whodata1labels[-(1:5), ]
whodata2labels <- data.table::fread("whs2023_annex2.csv")
labels2 <- whodata2labels[1:5, ]
whodata2 <- whodata2labels[-(1:5), ]

# merging who data
merge(
  # Data
  x     = whodata1,      
  y     = whodata2, 
  # List of variables to match
  by.x  = "V1",
  by.y  = "V1", 
  # Which obs to keep?
  all.x = TRUE,      
  ) %>% nrow()
[1] 210
whodata <- merge(
  # Data
  x     = whodata1,      
  y     = whodata2, 
  # List of variables to match
  by.x  = "V1",
  by.y  = "V1", 
  # Which obs to keep?
  all.x = TRUE,      
  all.y = FALSE
  )
#head(whodata[, list(V1, V2.x, V3.y)], n = 8)
whodata <- whodata[-(1:4), ]

#merging labels for reference
#merge(
  # Data
 # x     = labels1,      
#  y     = labels2, 
  # List of variables to match
#  by.x  = "V1",
#  by.y  = "V1", 
  # Which obs to keep?
#  all.x = TRUE,  
#  allow.cartesian = TRUE
#  ) %>% nrow()

#labels <- merge(
  # Data
 # x     = labels1,      
#  y     = labels2, 
  # List of variables to match
 # by.x  = "V1",
#  by.y  = "V1", 
  # Which obs to keep?
 # all.x = TRUE,      
#  allow.cartesian = TRUE
#  )
#head(labels[, list(V1, V2.x, V3.y)], n = 5)

#renaming key variables for convenience
whodata <- whodata %>% rename(countries = V1)
whodata <- whodata %>% rename(year5mortality = V14.x)
whodata <- whodata %>% rename(neonatalmortality = V15.x)
whodata <- whodata %>% rename(familyplanning = V6.y)
whodata <- whodata %>% rename(adolescentbirths15_19 = V8.y)
whodata <- whodata %>% rename(adolescentbirths10_14 = V9.y)

Methods

In order to answer the question posed above, data from the World Health Organization (WHO) was obtained. The WHO obtains yearly statistics by country to see how countries are working towards the sustainable development goals that the WHO has set. The data is collected and organized by the WHO and they release the comparable estimates of each country for each sustainable development goal. For this project, I looked at the data by country and obtained each countries proportion of women who have their need for family planning met. This is the independent variable for this project. As the primary outcome I obtained the number of neonatal mortality by country per 1000 neonates. For secondary outcomes I looked at the number of 5 year mortality per 1000 people by country, and the adolescent birth rate of both adolescents aged 10-14 and 15-19. I uploaded this data from the WHO and started by ensuring that there were no labels to interfere with the analysis within the data set and removed all of those. I ensured that each variable was coded as a numeric variable except for the countries variable which was coded as a character. As the primary and independent variables did not have data from certain countries, I eliminated those countries from the data set as they would add no value to the analysis. I did not eliminate countries with unavailable data for the secondary outcomes as this was less important and I did not want to skew the data of the main outcome. The percentage of countries with family planning is skewed to the right somewhat with a range of 2.1 to 89.6. There are no outliers. Neonatal mortality is slightly skewed to the left with no outliers and a range of 1 to 39. 5 year mortality is skewed to the left with 3 outliers and a range of 2 to 115. Adolescent birth of girls aged 15-19 is skewed to the left with 2 outliers and a range of 2.4 to 184.4. Adolescent birth of girls aged 10-14 is heavily skewed to the left with 6 outliers and a range from 0.0 to 10.7. For data exploration I used boxplots and summary statistics to view the data. To answer the main question I used correlation statistics, scatter plots with summary statistic lines, and also an interactive scatter plot.

Going through the EDA checklist

Checking dimensions and headers and footers

#dim(whodata)
#head(whodata)
#tail(whodata)

#found out there is an extra row that is meant as a label for years that needs to be deleted so deleting this row and then rechecking footer
row_numbers <- which(whodata$countries == "WHO region")
whodata <- whodata[-row_numbers, ]
#tail(whodata)

Checking Variable Types

#checking the variable types
variable_types <- sapply(whodata, class)
#print(variable_types)
is_numeric <- is.numeric(whodata$familyplanning)
#print(is_numeric)

#Finding out that all the variables are coded as characters rather than numberic so changing all relevant variables to numeric and then rechecking all variable types
whodata$familyplanning <- as.numeric(whodata$familyplanning)
Warning: NAs introduced by coercion
whodata$year5mortality <- as.numeric(whodata$year5mortality)
Warning: NAs introduced by coercion
whodata$neonatalmortality <- as.numeric(whodata$neonatalmortality)
Warning: NAs introduced by coercion
whodata$adolescentbirths15_19 <- as.numeric(whodata$adolescentbirths15_19)
Warning: NAs introduced by coercion
whodata$adolescentbirths10_14 <- as.numeric(whodata$adolescentbirths10_14)
Warning: NAs introduced by coercion
#variable_types <- sapply(whodata, class)
#print(variable_types)

#Looking at relevant variables closer
#summary(whodata$familyplanning)
#summary(whodata$year5mortality)
#summary(whodata$neonatalmortality)
#summary(whodata$adolescentbirths15_19)
#summary(whodata$adolescentbirths10_14)

#eliminating NAs in primary variables only
whodata <- whodata[!is.na(whodata$familyplanning)]
whodata <- whodata[!is.na(whodata$neonatalmortality)]
#summary(whodata$familyplanning)
#summary(whodata$neonatalmortality)

Summary statistics

#summary statistics
cor(whodata$familyplanning, whodata$neonatalmortality, use="complete")
[1] -0.3445729
cor(whodata$familyplanning, whodata$year5mortality, use = "complete")
[1] -0.4189029
cor(whodata$familyplanning, whodata$adolescentbirths15_19, use = "complete")
[1] -0.2751139
cor(whodata$familyplanning, whodata$adolescentbirths10_14, use = "complete")
[1] -0.2838636

Exploratory Graphs

#exploratory graph
#boxplot(whodata$familyplanning)
#boxplot(whodata$neonatalmortality)
#boxplot(whodata$year5mortality)
#boxplot(whodata$adolescentbirths15_19)
#boxplot(whodata$adolescentbirths10_14)

Results

There is a moderate negative correlation between family planning and neonatal mortality. It has an R coefficient of -0.345. There is also a downward trend in the data when graphing it on a scatter plot by country. There is more of a change in effect of decreased neonatal mortality once over 50% of the population uses family planning methods. Before 50% there is a minimal effect of increasing modern family and the decrease in neonatal mortality. There is also a negative correlation between family planning and 5 year mortality. There is an R coefficient of -0.419. There is a steady downward trend between an increase in family planning and a decrease in 5 year mortality. There is very little negative correlation between family planning and adolescent birth rate of girls age 15-19. The R coefficient is -0.275. There is also a very small negative correlation between family planning and adolescent birth rate of girls age 10-14. The R coefficient is -0.284. For both adolescent birth rate of those ages 15-19 and 10-14 there does not seem to be a very strong downward trajectory of the smooth fit line in the scatter plot which makes it seem like there is not a large effect of family planning on adolescent mortality.

Publishable Graphs - Family Planning vs Neonatal Mortality

whodata %>% 
  ggplot() + 
  geom_point(mapping = aes(x = whodata$familyplanning, y = whodata$neonatalmortality)) + 
  stat_smooth(mapping = aes(x = familyplanning, y = neonatalmortality)) +
  labs(title = "Number of Neonatal Mortality by Percentage of Modern Family Planning Use in each Country") + 
  labs(x = expression("Percentage of Modern Family Planning"), y = "Number of Neonatal Mortality per 1000")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

interactiveplot <- plot_ly(data = whodata, x = ~whodata$familyplanning, y = ~whodata$neonatalmortality, text = ~whodata$countries, type = "scatter", mode = "markers") 

interactiveplot <- interactiveplot %>% layout(
  title = "Neonatal Mortality by Percentage of Modern Family Planning with Country Names",
  xaxis = list(title = "Percentage of Modern Family Planning"),
  yaxis = list(title = "Number of Neonatal Mortality per 1000")
)

interactiveplot

Publishable Graphs - Family Planning vs 5 year Mortality

whodata %>% 
  ggplot() + 
  geom_point(mapping = aes(x = whodata$familyplanning, y = whodata$year5mortality)) + 
  stat_smooth(mapping = aes(x = familyplanning, y = year5mortality)) +
  labs(title = "5 year Mortality by % of Modern Family Planning Use in Each Country") + 
  labs(x = expression("Percentage of Modern Family Planning"), y = "Number of 5 year Mortality per 1000")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

mortality5 <- plot_ly(data = whodata, x = ~whodata$familyplanning, y = ~whodata$year5mortality, text = ~whodata$countries, type = "scatter", mode = "markers") 

mortality5 <- mortality5 %>% layout(
  title = "5 Year Mortality by Percentage of Modern Family Planning with Country Names",
  xaxis = list(title = "Percentage of Modern Family Planning"),
  yaxis = list(title = "Number of 5 year Mortality per 1000")
)

mortality5

Publishable Graphs - Family Planning vs Adolescent (15-19) Births

whodata %>% 
  ggplot() + 
  geom_point(mapping = aes(x = whodata$familyplanning, y = whodata$adolescentbirths15_19)) + 
  stat_smooth(mapping = aes(x = familyplanning, y = adolescentbirths15_19)) +
  labs(title = "Adolescent Birth (15-19) by % of Modern Family Planning Use in Each Country") + 
  labs(x = expression("Percentage of Modern Family Planning"), y = "Adolescent Birth (age 15-19) per 1000")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 4 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 4 rows containing missing values (`geom_point()`).

adolescent1519 <- plot_ly(data = whodata, x = ~whodata$familyplanning, y = ~whodata$adolescentbirths15_19, text = ~whodata$countries, type = "scatter", mode = "markers") 

adolescent1519 <- adolescent1519 %>% layout(
  title = "Adolescent Birth 15-19 by Percentage of Modern Family Planning with Country Names",
  xaxis = list(title = "Percentage of Modern Family Planning"),
  yaxis = list(title = "Adolescent Births Age 15-19 per 1000")
)

adolescent1519
Warning: Ignoring 4 observations

Publishable Graphs - Family Planning vs Adolescent (10-14) Births

whodata %>% 
  ggplot() + 
  geom_point(mapping = aes(x = whodata$familyplanning, y = whodata$adolescentbirths10_14)) + 
  stat_smooth(mapping = aes(x = familyplanning, y = adolescentbirths10_14)) +
  labs(title = "Adolescent Birth (10-14) by % of Modern Family Planning Use in Each Country") + 
  labs(x = expression("Percentage of Modern Family Planning"), y = "Adolescent Birth (age 10-14) per 1000")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 15 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 15 rows containing missing values (`geom_point()`).

adolescent1014 <- plot_ly(data = whodata, x = ~whodata$familyplanning, y = ~whodata$adolescentbirths10_14, text = ~whodata$countries, type = "scatter", mode = "markers") 

adolescent1014 <- adolescent1014 %>% layout(
  title = "ADolescent Birth (10-14) by Percentage of Modern Family Planning with Country Names",
  xaxis = list(title = "Percentage of Modern Family Planning"),
  yaxis = list(title = "Adolescent Births Age 10-14 per 1000")
)

adolescent1014
Warning: Ignoring 15 observations

Table

usefulwhodata <- whodata[, c("familyplanning", "neonatalmortality", "year5mortality", "adolescentbirths15_19", "adolescentbirths10_14")]

summary_table <- usefulwhodata %>%
  gather(variable, value) %>%
  group_by(variable) %>%
  summarise(
    Mean = mean(value, na.rm = TRUE),
    Median = median(value, na.rm = TRUE),
    SD = sd(value, na.rm = TRUE),
    Min = min(value, na.rm = TRUE),
    Max = max(value, na.rm = TRUE)
  )
kable(summary_table, format = "html")
variable Mean Median SD Min Max
adolescentbirths10_14 1.54300 1.0 1.908633 0.0 10.7
adolescentbirths15_19 61.63153 52.8 42.832929 2.4 184.4
familyplanning 58.14783 59.5 19.591640 2.1 89.6
neonatalmortality 16.14783 14.0 10.010734 1.0 39.0
year5mortality 36.26957 27.0 28.825951 2.0 115.0

Maps

world <- ne_countries(scale = "medium", returnclass = 'sf')
mapdata <- merge(world, whodata, by.x = "name_long", by.y = "countries", all.x = TRUE)

mapdata %>%
  ggplot() +
  geom_sf(data = mapdata, aes(fill = familyplanning)) +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  labs(title = "Family Planning by Country")

mapdata %>%
  ggplot() +
  geom_sf(data = mapdata, aes(fill = neonatalmortality)) +
  scale_fill_gradient(low = "lightgreen", high = "darkgreen") +
  labs(title = "Neonatal Mortality by Country")

Conclusion

There seems to be an association between increased family planning and decreased neonatal mortality. This may be the case because more people are able to choose when they want to have a baby and can be better prepared for taking care of that baby when it is born when there is more family planning. Another possible reason this could be true is because when there is increased family planning, women are more capable of planning when they want to have a baby and they can choose a time when they are better able to get prenatal care to ensure the health of the baby. There also seems to be an association between increased family planning and 5 year mortality. Most likely the same reasons apply to why this trend exists. One thing that was interesting was there did not seem to be a strong association between adolescent birth rate of girls age both 15-19 and 10-14 and family planning. This may be the case because family planning methods are probably not marketed to adolescents at the same level they are to adults. This can cause those who become sexually active earlier in life to potentially be more likely to not use family planning methods. It would be interesting to see the use of family planning methods based on age and see if most adolescents regardless of country do not use family planning methods which would explain the lack of association. One potential limitation of this study is that there might have been several countries that we eliminated that had no neonatal mortality data and family planning data. These countries could have strengthened our association because the countries which do not have the capabilities of gathering this data may more likely be from a lower income country and their access to family planning and health care is limited. Therefore, it might be likely that they had low rates of family planning and higher rates of neonatal mortality.